Axioms for AI Alignment from Human Feedback

Neural Information Processing Systems

In the context of reinforcement learning from human feedback (RLHF), the reward function is generally derived from maximum likelihood estimation of a random utility model based on pairwise comparisons made by humans. The problem of learning a reward function is one of preference aggregation that, we argue, largely falls within the scope of social choice theory. From this perspective, we can evaluate different aggregation methods via established axioms, examining whether these methods meet or fail well-known standards. We demonstrate that both the Bradley-Terry-Luce model and its broad generalizations fail to meet basic axioms. In response, we develop novel rules for learning reward functions with strong axiomatic guarantees. A key innovation from the standpoint of social choice is that our problem has a linear structure, which greatly restricts the space of feasible rules and leads to a new paradigm that we call linear social choice.
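As a hedged illustration of the random utility setup the abstract refers to (not the authors' axiomatic rules), the Bradley-Terry-Luce model can be fit by maximum likelihood from pairwise comparisons. The toy data below is hypothetical; a minimal sketch:

```python
import numpy as np

# Hypothetical toy data: comparisons[i] = (winner, loser) over 3 alternatives.
comparisons = [(0, 1), (0, 2), (1, 2), (0, 1)]
n = 3
u = np.zeros(n)  # latent utilities

# Under Bradley-Terry-Luce, P(a beats b) = sigmoid(u[a] - u[b]).
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Gradient ascent on the pairwise log-likelihood.
for _ in range(500):
    grad = np.zeros(n)
    for w, l in comparisons:
        p = sigmoid(u[w] - u[l])
        grad[w] += 1 - p
        grad[l] -= 1 - p
    u += 0.1 * grad
    u -= u.mean()  # utilities are identifiable only up to an additive shift

print(u)  # alternative 0, which never loses, ranks highest
```

The mean-centering step reflects the identifiability issue that makes this an aggregation problem: only utility differences are pinned down by the comparisons.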



Fast Sparse Group Lasso

Yasutoshi Ida, Yasuhiro Fujiwara, Hisashi Kashima

Neural Information Processing Systems

However, since an update of a single parameter group depends on all the parameter groups or data points, the computation cost is high when the number of parameters or data points is large. This paper proposes a fast Block Coordinate Descent for the Sparse Group Lasso.
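For context on the objective being optimized (a proximal-gradient sketch of the sparse group lasso penalty, not the paper's accelerated block coordinate descent), the prox of the combined l1 plus group-l2 penalty is elementwise soft-thresholding followed by group shrinkage. The data and penalty weights below are illustrative assumptions:

```python
import numpy as np

def prox_sparse_group(v, lam1, lam2):
    """Prox of lam1*||.||_1 + lam2*||.||_2 for one group: elementwise
    soft-thresholding, then group-level shrinkage (possibly to zero)."""
    s = np.sign(v) * np.maximum(np.abs(v) - lam1, 0.0)
    norm = np.linalg.norm(s)
    if norm <= lam2:
        return np.zeros_like(s)
    return (1 - lam2 / norm) * s

# Toy problem: two groups of 2 features; only the first group is relevant.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))
beta_true = np.array([2.0, -1.5, 0.0, 0.0])
y = X @ beta_true
groups = [np.array([0, 1]), np.array([2, 3])]

lam = 0.1  # assumed small penalty weight for both terms
beta = np.zeros(4)
step = 1.0 / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
for _ in range(300):
    z = beta - step * (X.T @ (X @ beta - y))  # gradient step on 0.5*||Xb - y||^2
    for g in groups:                          # prox is separable across groups
        beta[g] = prox_sparse_group(z[g], step * lam, step * lam)

print(np.round(beta, 2))  # approximately recovers beta_true; second group near zero
```

The cost issue the abstract describes shows up in the gradient step: it touches all parameters and all data points, which is what the paper's pruning-based block coordinate descent avoids.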




Understanding the Effect of Stochasticity in Policy Optimization

Neural Information Processing Systems

Until recently it had generally been assumed that methods based on following the policy gradient (PG) [1] could not be guaranteed to converge to globally optimal solutions, given that the policy value function is not concave.
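To make the non-concavity point concrete, here is a minimal sketch (my own illustration, not the paper's analysis) of stochastic policy gradient (REINFORCE) with a softmax policy on a two-armed bandit. Despite the non-concave objective, the policy concentrates on the better arm in this simple case; the bandit and step size are assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
rewards = np.array([1.0, 0.5])  # arm 0 is optimal
theta = np.zeros(2)             # softmax policy parameters

# REINFORCE: sample an action, then ascend r * grad log pi(a | theta).
for _ in range(2000):
    p = np.exp(theta) / np.exp(theta).sum()
    a = rng.choice(2, p=p)
    r = rewards[a]
    grad = -p
    grad[a] += 1.0              # gradient of log pi(a)
    theta += 0.1 * r * grad

p = np.exp(theta) / np.exp(theta).sum()
print(p)  # probability mass concentrates on arm 0
```

The objective E[r] as a function of theta is not concave, yet the expected update direction always favors the optimal arm here; the paper studies how the stochasticity of such sampled updates affects convergence more generally.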



Active learning for data-driven reduced models of parametric differential systems with Bayesian operator inference

McQuarrie, Shane A., Guo, Mengwu, Chaudhuri, Anirban

arXiv.org Machine Learning

Numerical simulation of complex physical phenomena is a core enabling technology for digital twins, which are comprised of physical and virtual assets with a two-way flow of information: data from the physical asset is used to construct and/or calibrate the virtual asset (a numerical model), while numerical predictions from the virtual asset are used for control or decision-making for the physical asset [42]. To be viable for practical application, the virtual asset must be able to produce predictions rapidly and reliably; however, the underlying physics that are of interest for digital twin applications can typically only be accurately simulated using a large number of degrees of freedom, leading to computationally expensive numerical simulations. The explainability and computational efficiency of decisions made by the digital twin play a key role in safety-critical applications, making explainable artificial intelligence an essential ingredient [24]. Model reduction techniques are one such explainable scientific machine learning technique that construct low-dimensional systems, called reduced-order models (ROMs), to serve as computationally inexpensive surrogates for a high-dimensional physics simulation [4, 20]. This paper introduces a technique for adaptively constructing ROMs to emulate systems with parametric dependence, that is, systems whose behavior varies with some set of parameters, usually representing physical properties. We focus on systems where the parametric dependence manifests in the operators defining the model, not merely in initial conditions or external inputs.
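As background for the projection-based reduction the abstract builds on (this sketch shows plain proper orthogonal decomposition, not the paper's Bayesian operator inference or its parametric treatment), a ROM basis can be extracted from state snapshots via an SVD; the synthetic snapshot data here is an assumption:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, r = 200, 40, 3                      # state dim, snapshot count, reduced dim

# Synthetic snapshots that truly live in an r-dimensional subspace.
modes = np.linalg.qr(rng.standard_normal((n, r)))[0]
Q = modes @ rng.standard_normal((r, m))

# POD: leading left singular vectors of the snapshot matrix form the basis.
U, s, _ = np.linalg.svd(Q, full_matrices=False)
V = U[:, :r]                              # reduced basis, n x r
Q_hat = V.T @ Q                           # reduced coordinates, r x m
Q_rec = V @ Q_hat                         # lift back to full dimension

print(np.linalg.norm(Q - Q_rec) / np.linalg.norm(Q))  # ~0 for exactly rank-r data
```

Operator inference then fits low-dimensional operators to the reduced coordinates Q_hat; the paper's contribution is doing this in a Bayesian, actively sampled way across the parameter domain.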


Fast Iterative Hard Thresholding Methods with Pruning Gradient Computations

Neural Information Processing Systems

We accelerate the iterative hard thresholding (IHT) method, which finds the k most important elements of a parameter vector in a linear regression model. Although the plain IHT repeatedly updates the parameter vector during optimization, computing gradients is the main bottleneck. Our method safely prunes unnecessary gradient computations to reduce processing time. The main idea is to efficiently construct, at each iteration, a candidate set containing the k important elements of the parameter vector. Specifically, before computing the gradients, we prune elements of the parameter vector that cannot enter the candidate set by using upper bounds on the absolute values of the parameters. Because our pruning is safe, our method guarantees the same optimization results as the plain IHT. Experiments show that our method is up to 73 times faster than the plain IHT without degrading accuracy.
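For reference, here is a sketch of the plain IHT baseline the paper accelerates (not the pruned variant): a gradient step followed by keeping only the k largest-magnitude entries. The toy regression problem is an assumption:

```python
import numpy as np

def iht(X, y, k, iters=200):
    """Plain iterative hard thresholding: gradient step, then zero out all
    but the k largest-magnitude entries. Every iteration computes a full
    gradient, which is the bottleneck the paper's pruning avoids."""
    n, d = X.shape
    beta = np.zeros(d)
    step = 1.0 / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant
    for _ in range(iters):
        beta = beta - step * (X.T @ (X @ beta - y))
        keep = np.argsort(np.abs(beta))[-k:]
        mask = np.zeros(d, dtype=bool)
        mask[keep] = True
        beta[~mask] = 0.0
    return beta

# Toy problem: 2-sparse ground truth, noiseless observations.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
beta_true = np.zeros(20)
beta_true[[3, 7]] = [1.5, -2.0]
y = X @ beta_true
beta = iht(X, y, k=2)
print(np.flatnonzero(beta))  # recovered support
```

The argsort over |beta| is where the paper's candidate set enters: if cheap upper bounds show an element cannot rank among the top k, its gradient coordinate need not be computed.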


Fair Sparse Regression with Clustering: An Invex Relaxation for a Combinatorial Problem

Neural Information Processing Systems

In this paper, we study the problem of fair sparse regression on a biased dataset where bias depends upon a hidden binary attribute. The presence of a hidden attribute adds an extra layer of complexity to the problem by combining sparse regression and clustering with unknown binary labels. The corresponding optimization problem is combinatorial, but we propose a novel relaxation of it as an invex optimization problem. To the best of our knowledge, this is the first invex relaxation for a combinatorial problem. We show that the inclusion of the debiasing/fairness constraint in our model has no adverse effect on the performance. Rather, it enables the recovery of the hidden attribute.